Data Representation
Subject: Computer Science
Topic: 4
Cambridge Code: 0478
Number Systems
Binary (Base 2)
Binary - Base 2 number system (digits 0 and 1)
Position values: each place is a power of 2, so from the right they are 1, 2, 4, 8, 16, and so on
Example: 10110₂ = 1(16) + 0(8) + 1(4) + 1(2) + 0(1) = 22₁₀
Converting to binary:
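- Divide the number by 2 repeatedly until the quotient is 0
- Each remainder gives one binary digit
- Read the remainders from bottom to top (the last remainder is the leftmost digit)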
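The repeated-division steps above can be sketched in Python. This is a minimal illustration only; the function name `to_binary` is a hypothetical helper, not something defined in these notes.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)      # quotient carries on, remainder is the next digit
        remainders.append(str(r))
    # The first remainder is the rightmost digit, so read the list in reverse
    return "".join(reversed(remainders))


print(to_binary(22))  # 10110
```

Python's built-in `bin(22)` gives the same digits with a `0b` prefix; the loop is written out here only to mirror the manual method.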
Hexadecimal (Base 16)
Hexadecimal - Base 16 (digits 0-9, A-F)
Digits: 0,1,2,3,4,5,6,7,8,9,A(10),B(11),C(12),D(13),E(14),F(15)
Position values: each place is a power of 16, so from the right they are 1, 16, 256, and so on
Example: 2F₁₆ = 2(16) + 15(1) = 47₁₀
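A short sketch of how position values turn a digit string into a decimal number, for any base from 2 to 16. The helper `to_decimal` and the `DIGITS` lookup string are illustrative names assumed for this example.

```python
DIGITS = "0123456789ABCDEF"  # symbol at index i has the value i


def to_decimal(digits: str, base: int) -> int:
    """Add up digit value x position value, working from the rightmost digit."""
    total = 0
    for position, ch in enumerate(reversed(digits.upper())):
        value = DIGITS.index(ch)  # raises ValueError for an unknown symbol
        if value >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        total += value * base ** position
    return total


print(to_decimal("2F", 16))    # 47
print(to_decimal("10110", 2))  # 22
```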
Advantages:
- Compact representation
- Easy conversion to binary
- Used for memory addresses, colors
Converting Between Systems
Binary ↔ Hexadecimal:
- 1 hex digit = 4 binary digits
- Group binary in fours
- Convert each group
Example: 10110011₂ = B3₁₆
- 1011₂ = B₁₆
- 0011₂ = 3₁₆
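The grouping method above could be written as follows. This is a sketch under the assumption that inputs are plain digit strings; `binary_to_hex` and `hex_to_binary` are hypothetical helper names.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Group the bits in fours (padding on the left) and convert each group."""
    padded_length = (len(bits) + 3) // 4 * 4
    padded = bits.zfill(padded_length)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_binary(hex_digits: str) -> str:
    """Expand every hex digit into exactly four binary digits."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)


print(binary_to_hex("10110011"))  # B3
print(hex_to_binary("B3"))        # 10110011
```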
Data Units
Bit: Single binary digit (0 or 1)
Byte: 8 bits
Size Conversions
| Unit | Size |
|---|---|
| Kilobyte (KB) | 1,024 bytes |
| Megabyte (MB) | 1,024 KB |
| Gigabyte (GB) | 1,024 MB |
| Terabyte (TB) | 1,024 GB |
Note: In casual usage 1,024 is often rounded to 1,000 (for example, 1 KB ≈ 1,000 bytes)
Calculating Storage
Example: How many bits in 5 MB?
5 MB × 1024 KB/MB × 1024 bytes/KB × 8 bits/byte = 41,943,040 bits
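The same calculation can be checked with a few lines of Python. This sketch assumes the 1,024-based units from the table above; `UNIT_BYTES` and `to_bits` are names made up for the example.

```python
# Bytes per unit, using the 1,024-based sizes from the table
UNIT_BYTES = {
    "B": 1,
    "KB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
    "TB": 1024 ** 4,
}


def to_bits(amount: int, unit: str) -> int:
    """Convert an amount in the given unit to bits (8 bits per byte)."""
    return amount * UNIT_BYTES[unit] * 8


print(to_bits(5, "MB"))  # 41943040
```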
Character Encoding
ASCII (American Standard Code for Information Interchange)
ASCII - 7-bit encoding (128 characters)
Ranges:
- 0-31: Control characters
- 32-47: Space (32) and punctuation
- 48-57: Digits 0-9
- 65-90: Uppercase A-Z
- 97-122: Lowercase a-z
Example:
- 'A' = 65 = 01000001₂
- '0' = 48 = 00110000₂
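Python's built-in `ord` and `chr` functions make these codes easy to check. The wrapper `ascii_info` below is a hypothetical helper written for this example.

```python
def ascii_info(ch: str):
    """Return the ASCII code of a character and its 8-bit binary form."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return code, format(code, "08b")


print(ascii_info("A"))  # (65, '01000001')
print(ascii_info("0"))  # (48, '00110000')
print(chr(97))          # a
```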
Extended ASCII
Extended ASCII - 8-bit encoding (256 characters)
- Includes accented characters
- Special symbols
- Scientific characters
Unicode
Unicode - Universal character set
UTF-8: Variable-length (1-4 bytes)
- ASCII compatible
- Most common on web
UTF-16: Variable-length (2 or 4 bytes)
- Used in many applications
Advantages:
- Supports all languages
- Emojis and special characters
- Global compatibility
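To see how many bytes each encoding needs, a small sketch using Python's standard `str.encode` is shown below. The sample characters and the helper name `encoded_sizes` are choices made for illustration; `utf-16-be` is used so that no byte-order mark is added to the count.

```python
def encoded_sizes(ch: str):
    """Return (bytes in UTF-8, bytes in UTF-16) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in ["A", "é", "€", "😀"]:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

The letter A takes 1 byte in UTF-8 (the same value as its ASCII code), while the emoji takes 4 bytes in both encodings.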
Image Representation
Bitmap (Raster) Images
Bitmap - Grid of colored pixels
Color representation:
- RGB: Red, Green, Blue (each 0-255)
- Example: 255,0,0 = Pure red
Color depth:
- 8-bit: 256 colors
- 16-bit: 65,536 colors
- 24-bit: 16.7 million colors
File size calculation: width × height × color depth (in bits)
Example: 100×100 pixels, 24-bit
Size = 100 × 100 × 24 bits = 240,000 bits = 30,000 bytes ≈ 29.3 KB
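The bitmap calculation can be sketched as a small function. It assumes an uncompressed image with no file header or metadata, and the name `bitmap_size_bits` is hypothetical.

```python
def bitmap_size_bits(width_px: int, height_px: int, color_depth_bits: int) -> int:
    """Uncompressed bitmap size in bits: one color value per pixel."""
    return width_px * height_px * color_depth_bits


bits = bitmap_size_bits(100, 100, 24)
size_bytes = bits // 8
size_kb = size_bytes / 1024

print(bits)               # 240000
print(size_bytes)         # 30000
print(round(size_kb, 1))  # 29.3
```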
Vector Images
Vector - Mathematical descriptions of shapes
Advantages:
- Scalable without quality loss
- Smaller file sizes (simple shapes)
- Resolution independent
Disadvantages:
- Not suitable for complex images such as photographs
- Less photorealistic
Image Compression
Lossy compression:
- Removes data
- Smaller file size
- Quality degradation
- Examples: JPEG (images), MP4 (video)
Lossless compression:
- No data removal
- Larger file size
- Perfect restoration
- Examples: PNG, GIF (images), ZIP (general files)
Sound Representation
Sound Digitization
Sampling - Measuring the amplitude of the sound wave at regular intervals
Sampling rate: How many samples are taken per second (measured in Hz)
- CD quality: 44.1 kHz
- Professional: 48 kHz
- Telephony: 8 kHz
- Higher rate = better quality, but larger file size
Sample resolution (Bit depth):
- 8-bit: 256 volume levels
- 16-bit: 65,536 volume levels
- 24-bit: 16.7 million levels
- Higher = better quality, but larger file size
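File size calculation: sampling rate × sample resolution × duration in seconds
Example: 44.1 kHz, 16-bit, 3 minutes (one channel)
Size = 44,100 × 16 × 180 = 127,008,000 bits = 15,876,000 bytes ≈ 15.1 MB (taking 1 MB as 1,024 × 1,024 bytes)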
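A sketch of the sound size calculation is given below. The `channels` parameter is an assumption added for illustration (1 for mono, 2 for stereo), and `sound_size_bits` is a hypothetical name. The result in megabytes depends on whether 1 MB is taken as 1,000,000 bytes or as 1,024 × 1,024 bytes, so both are printed.

```python
def sound_size_bits(sample_rate_hz: int, bit_depth: int,
                    duration_s: int, channels: int = 1) -> int:
    """Uncompressed sound size in bits."""
    return sample_rate_hz * bit_depth * duration_s * channels


bits = sound_size_bits(44_100, 16, 3 * 60)
size_bytes = bits // 8

print(bits)                              # 127008000
print(size_bytes)                        # 15876000
print(round(size_bytes / 1_000_000, 1))  # 15.9 (1 MB taken as 1,000,000 bytes)
print(round(size_bytes / 1024 ** 2, 1))  # 15.1 (1 MB taken as 1,024 x 1,024 bytes)
```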
Sound Compression
Lossy (MP3, AAC):
- Removes inaudible frequencies
- 10:1 compression ratio typical
- Acceptable quality loss
Lossless (FLAC; WAV files are usually stored uncompressed):
- Preserves all data
- Larger files
- Perfect reproduction
Text Compression
Run-Length Encoding (RLE)
RLE - Replace repeated characters with count + character
Example: AAABBCDDD → 3A2B1C3D
Efficiency depends on the data: very effective for repetitive data, but the output can be longer than the input when there are few repeats
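A minimal RLE sketch in the count + character format shown above. It assumes the text itself contains no digits (otherwise decoding would be ambiguous); `rle_encode` and `rle_decode` are hypothetical helper names.

```python
def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with count + character."""
    if not text:
        return ""
    parts = []
    current, count = text[0], 1
    for ch in text[1:]:
        if ch == current:
            count += 1
        else:
            parts.append(f"{count}{current}")
            current, count = ch, 1
    parts.append(f"{count}{current}")  # flush the final run
    return "".join(parts)


def rle_decode(encoded: str) -> str:
    """Expand count + character pairs back into the original text."""
    parts, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # counts may have more than one digit
        else:
            parts.append(ch * int(count))
            count = ""
    return "".join(parts)


print(rle_encode("AAABBCDDD"))  # 3A2B1C3D
print(rle_encode("ABCD"))       # 1A1B1C1D
```

The second call shows the weak case: with no repeats, the encoded text is twice as long as the input.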
Dictionary Compression
Lempel-Ziv-Welch (LZW):
- Replaces repeating sequences with codes
- Adaptive dictionary
- Used by GIF images; ZIP files typically use DEFLATE, a related dictionary method combined with Huffman coding
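As a rough illustration of the adaptive dictionary idea, here is a simplified LZW sketch. It assumes every input character has a code point below 256 and outputs a list of integer codes rather than packed bits; the function names are hypothetical.

```python
def lzw_compress(text: str) -> list:
    """Return a list of dictionary codes for the text."""
    dictionary = {chr(i): i for i in range(256)}  # start with all single characters
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                   # keep extending the match
        else:
            codes.append(dictionary[current])     # emit the longest known sequence
            dictionary[candidate] = next_code     # learn the new sequence
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list) -> str:
    """Rebuild the text, recreating the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    parts = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = previous + previous[0]        # code used before it was stored
        else:
            raise ValueError("invalid LZW code")
        parts.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(parts)


print(lzw_compress("ABABABA"))  # [65, 66, 256, 258]
```

Seven characters are represented by four codes here because the sequences AB and ABA are added to the dictionary as the text is read.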
Error Detection and Correction
Parity Bit
Parity - Extra bit for error detection
Even parity: Total 1s (including parity) = even
Odd parity: Total 1s (including parity) = odd
Example (even): 1011010 → 10110100 (the data already has four 1s, so add 0)
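Parity generation and checking can be sketched as below. The example assumes the parity bit is appended at the end of a 7-bit string; `add_parity_bit` and `check_parity` are hypothetical names, and the sample data `1011000` is chosen for illustration.

```python
def add_parity_bit(data_bits: str, parity: str = "even") -> str:
    """Append a parity bit so the total number of 1s is even (or odd)."""
    ones = data_bits.count("1")
    if parity == "even":
        bit = ones % 2          # 1 only when the data has an odd number of 1s
    elif parity == "odd":
        bit = (ones + 1) % 2    # 1 only when the data has an even number of 1s
    else:
        raise ValueError("parity must be 'even' or 'odd'")
    return data_bits + str(bit)


def check_parity(received_bits: str, parity: str = "even") -> bool:
    """Return True if the received bits (including parity) pass the check."""
    expected_remainder = 0 if parity == "even" else 1
    return received_bits.count("1") % 2 == expected_remainder


sent = add_parity_bit("1011000")  # three 1s in the data
print(sent)                       # 10110001
print(check_parity(sent))         # True
print(check_parity("10110011"))   # False (one bit was flipped in transit)
```

A single parity bit only detects an odd number of flipped bits; two flipped bits cancel out and pass the check.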
Checksum
Checksum - Value calculated from the data, for example the sum of all bytes modulo some value
- Added to end of data
- Receiver recalculates the checksum and compares it with the one received
- Detects transmission errors
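One simple way to picture this is a checksum formed by adding the byte values and keeping the remainder modulo 256, so the result fits in one byte. That choice of modulus, and the names `checksum`, `make_packet` and `verify`, are assumptions for this sketch; real protocols use a variety of schemes.

```python
def checksum(data: bytes, modulus: int = 256) -> int:
    """Sum of the byte values, reduced modulo the given value."""
    return sum(data) % modulus


def make_packet(data: bytes) -> bytes:
    """Sender: append the one-byte checksum to the end of the data."""
    return data + bytes([checksum(data)])


def verify(packet: bytes) -> bool:
    """Receiver: recalculate the checksum and compare with the received one."""
    if not packet:
        return False
    data, received = packet[:-1], packet[-1]
    return checksum(data) == received


packet = make_packet(b"AB")   # 65 + 66 = 131
print(packet[-1])             # 131
print(verify(packet))         # True
print(verify(b"AC" + packet[-1:]))  # False (data changed in transit)
```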
Error Correcting Codes
Hamming code:
- Detects and corrects single-bit errors
- Multiple parity bits at specific positions
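The notes do not name a specific variant, so the sketch below assumes the common Hamming(7,4) layout with even parity: parity bits at positions 1, 2 and 4, data bits at positions 3, 5, 6 and 7. All function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, error position); position 0 means no error."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4   # failed checks add up to the position
    if error_position:
        c[error_position - 1] ^= 1          # flip the faulty bit back
    return c, error_position


def hamming74_data(codeword):
    """Extract the 4 data bits from a 7-bit codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1                            # simulate an error at position 5
fixed, position = hamming74_correct(received)
print(sent)                                 # [0, 1, 1, 0, 0, 1, 1]
print(position)                             # 5
print(hamming74_data(fixed))                # [1, 0, 1, 1]
```

The three parity checks that fail identify the faulty position directly, which is what lets the receiver correct a single-bit error instead of only detecting it.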
Key Points
- Binary: Base 2 (0-1)
- Hexadecimal: Base 16 (0-9, A-F)
- Data units: Bit, byte, KB, MB, GB, TB
- ASCII: 7-bit, 128 characters
- Unicode: Supports all languages
- Bitmap: Pixel-based, color depth matters
- Vector: Math-based, scalable
- Lossy compression removes data
- Lossless compression preserves all data
Practice Questions
- Convert binary ↔ decimal ↔ hexadecimal
- Calculate file sizes
- Explain character encodings
- Compare bitmap vs vector
- Calculate image/sound file sizes
- Apply RLE compression
- Detect parity errors
Revision Tips
- Practice number conversions
- Know data unit relationships
- Understand ASCII/Unicode
- Know color depth effects
- Understand sampling rate importance
- Compare compression types
- Calculate file sizes accurately